Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs
Abstract
We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation integrates different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and use them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that these textual patterns are useful for the task of authorship attribution.
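The idea of reading features off shortest path walks can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a graph built only from hand-written dependency triples (the integrated syntactic graph described above combines several levels of language description), and the names `build_graph`, `shortest_path`, and `path_features`, the `ROOT` node, and the relation labels are all hypothetical choices made for illustration.

```python
from collections import Counter, deque


def build_graph(sentences):
    """Merge per-sentence (head, relation, dependent) triples into one graph.

    Words shared between sentences become a single node, which is what
    lets paths cross sentence boundaries. If the same head/dependent pair
    occurs with two relations, the later one overwrites the earlier one
    (a simplification made for this sketch).
    """
    graph = {}
    for triples in sentences:
        for head, relation, dependent in triples:
            graph.setdefault(head, {})[dependent] = relation
            graph.setdefault(dependent, {})
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; return one shortest node path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def path_features(graph, root="ROOT"):
    """Count the relation labels seen on shortest paths from root to each node."""
    features = Counter()
    for target in graph:
        if target == root:
            continue
        path = shortest_path(graph, root, target)
        if path is None:
            continue
        for head, dependent in zip(path, path[1:]):
            features[graph[head][dependent]] += 1
    return features


# Two toy sentences, "the cat sat on mat" and "cat slept", as assumed triples.
sentences = [
    [("ROOT", "root", "sat"), ("sat", "nsubj", "cat"), ("cat", "det", "the"),
     ("sat", "prep", "on"), ("on", "pobj", "mat")],
    [("ROOT", "root", "slept"), ("slept", "nsubj", "cat")],
]
graph = build_graph(sentences)
features = path_features(graph)
print(dict(features))
```

A feature vector of this kind, computed per document, could then be passed to any standard classifier or similarity measure; which one the paper uses is not stated in the abstract.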
Related resources
A Graph Based Authorship Identification Approach: Notebook for PAN at CLEF 2015
The paper describes our approach for the Authorship Identification task at the PAN CLEF 2015. We extract textual patterns based on features obtained from shortest path walks over Integrated Syntactic Graphs (ISG). Then we calculate a similarity between the unknown document and the known document with these patterns. The approach uses a predefined threshold in order to decide if the unknown docu...
A Ranking Based Model for Automatic Image Annotation in a Social Network
We propose a relational ranking model for learning to tag images in social media sharing systems. This model learns to associate a ranked list of tags to unlabeled images, by considering simultaneously content information (visual or textual) and relational information among the images. It is able to handle implicit relations like content similarities, and explicit ones like friendship or author...
Syntactic Stylometry: Using Sentence Structure for Authorship Attribution
Most approaches to statistical stylometry have concentrated on lexical features, such as relative word frequencies or type-token ratios. Syntactic features have been largely ignored. This work attempts to fill that void by introducing a technique for authorship attribution based on dependency grammar. Syntactic features are extracted from texts using a common dependency parser, and those featur...
Automatic Verb Classification Using Multilingual Resources
We propose the use of multilingual corpora in the automatic classification of verbs. We extend the work of Merlo and Stevenson (2001), in which statistics over simple syntactic features extracted from textual corpora were used to train an automatic classifier for three lexical semantic classes of English verbs. We hypothesize that some lexical semantic features that are difficult to detect superrc...